"Burkholderia sprentiae" strain WSM5005 T is an aerobic, motile, Gram-negative,
non-spore-forming rod that was isolated in Australia from an effective N2-fixing root
nodule of Lebeckia ambigua collected in Klawer, Western Cape of South Africa, in
October 2007. Here we describe the features of "Burkholderia sprentiae" strain
WSM5005 T, together with the genome sequence and its annotation. The 7,761,063 bp
high-quality-draft genome is arranged in 8 scaffolds of 236 contigs, contains 7,147
protein-coding genes and 76 RNA-only encoding genes, and is one of 20 rhizobial
genomes sequenced as part of the DOE Joint Genome Institute 2010 Community
Sequencing Program.



Introduction 

Legumes of the Fabaceae family of flowering plants have the unique capacity to
form an N2-fixing symbiosis with soil-inhabiting root nodule bacteria (RNB). This
symbiosis supplies leguminous species with the essential bioavailable nitrogen that
could otherwise not be obtained from soils that are inherently infertile. The
agricultural region of south-west Western Australia contains such impoverished
soils and the successful establishment of effective legume-RNB symbioses has been
exploited to drive plant and animal productivity in this landscape without reliance
on nitrogenous fertilizer [1,2]. This landscape's rainfall patterns appear to be
changing, from a dry Mediterranean-type distribution to a generally reduced annual
rainfall with a less predictable distribution [3]. These changes in rainfall
patterns challenge the reproduction of the commercially used annual legume species.
Perennial species might be better able to adapt to climate change, though few
commercial perennial forage legumes are adapted to the acid and infertile soils
encountered in the region [2]. Therefore, deep-rooted herbaceous perennial legumes,
including Rhynchosia and Lebeckia species adapted to acid and infertile soils, have
been investigated for use in this Australian agricultural setting [2,4,5]. The
genus Lebeckia Thunb. is part of the Crotalarieae tribe, and refers to a group of
33 species of papilionoid legumes that are endemic to the southern and western
parts of South Africa, which have similar soil and climate conditions to Western
Australia [6,7]. This genus has recently been revised and is now subdivided into
several sections, including Lebeckia s.s., Calobota and Wiborgiella [7]. Species in
the Lebeckia s.s. section, which includes L. ambigua, can easily be distinguished
from other species by their acicular leaves and 5+5 anther arrangement [7-9].
In four expeditions to the Western Cape of South Africa, between 2002 and 2007,
nodules and seeds of Lebeckia ambigua were collected and stored [5]. The isolation
of RNB from these nodules gave rise to a collection of 23 microsymbionts that
clustered into five groups within the genus Burkholderia [5]. Unlike most of the
previously studied rhizobial Burkholderia strains, this South African group appears
to be associated with papilionoid forage legumes (rather than Mimosa spp.). One of
these Burkholderia strains has now been designated as the type strain of the new
species "Burkholderia sprentiae" strain WSM5005 T [10]. This isolate effectively
nodulates Lebeckia ambigua and L. sepiaria [5]. Here we present a summary
classification and a set of general features for "Burkholderia sprentiae" strain
WSM5005 T together with the description of the complete genome sequence and its
annotation.

Classification and general features

"Burkholderia sprentiae" strain WSM5005 T is a motile, Gram-negative,
non-spore-forming rod (Figure 1, left and center panels) in the order
Burkholderiales of the class Betaproteobacteria [10]. It is fast growing, forming
2-4 mm diameter colonies within 2-3 days when grown on half Lupin Agar (½LA) [11]
at 28°C. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid,
with smooth margins (Figure 1, right panel).

Minimum Information about the Genome Sequence (MIGS) is provided in Table 1.
Figure 2 shows the phylogenetic relationship of "Burkholderia sprentiae" strain
WSM5005 T in a 16S rRNA sequence based tree. This strain clusters closest to
Burkholderia tuberum STM678 T (CIP 108238 T) and Burkholderia kururiensis KP23 T
with 98.2% and 96.9% sequence identity, respectively.
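
As an illustration of what a pairwise sequence identity figure such as those above expresses, the following minimal sketch computes percent identity over the gap-free columns of two pre-aligned sequences. The source does not state how the identity values were calculated, so the function name `percent_identity`, the choice to skip gap columns, and the toy sequences are all assumptions made for this example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of gap-free aligned columns in which both sequences carry the same base.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Skipping gap columns is an assumption of this sketch, not a method
    taken from the report.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical; not real 16S rRNA data).
toy_a = "ACGTACGTAC"
toy_b = "ACGTTCGTAC"
print(f"{percent_identity(toy_a, toy_b):.1f}%")  # 9 of 10 columns match: 90.0%
```

Real 16S rRNA comparisons would start from a multiple sequence alignment rather than hand-typed strings; the arithmetic on the aligned columns is the same.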

Symbiotaxonomy

"Burkholderia sprentiae" strain WSM5005 T is part of a cadre of Burkholderia
strains that were assessed for nodulation and nitrogen fixation on three separate
L. ambigua genotypes (CRSLAM-37, CRSLAM-39 and CRSLAM-41) and on L. sepiaria [5].
Representatives of this group of nodule bacteria are generally Nod+ and Fix- on
Macroptilium atropurpureum and appear to have a very narrow host range for
symbiosis. They belong to a group of Burkholderia strains that nodulate papilionoid
forage legumes rather than the classical Burkholderia hosts Mimosa spp.
(Mimosoideae) [28].

Genome sequencing and annotation information

Genome project history 

This organism was selected for sequencing on the basis of its environmental and
agricultural relevance to issues in global carbon cycling, alternative energy
production, and biogeochemical importance, and is part of the Community Sequencing
Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects
of relevance to agency missions. The genome project is deposited in the Genomes
OnLine Database [27] and an improved-high-quality-draft genome sequence is
deposited in IMG. Sequencing, finishing and annotation were performed by the JGI.
A summary of the project information is shown in Table 2.




Figure 1. Images of "Burkholderia sprentiae" strain WSM5005 T using scanning (Left) and
transmission (Center) electron microscopy and the colony morphology on a solid medium (Right).
Table 1. Classification and general features of "Burkholderia sprentiae" strain WSM5005 T
according to the MIGS recommendations [12,13].

MIGS ID    Property                 Term                                Evidence code
           Current classification   Domain Bacteria                     TAS [13]
                                    Phylum Proteobacteria               TAS [14]
                                    Class Betaproteobacteria            TAS [15,16]
                                    Order Burkholderiales               TAS [15,17]
                                    Family Burkholderiaceae             TAS [15,18]
                                    Genus Burkholderia                  TAS [19-21]
                                    Species "Burkholderia sprentiae"    TAS [10]
           Gram stain               Negative                            IDA [22]
           Cell shape               Rod                                 IDA
           Motility                 Motile                              IDA
           Sporulation              Non-sporulating                     IDA [22]
           Temperature range        Mesophile                           IDA [22]
           Optimum temperature      28°C                                IDA
           Salinity                 Not reported
MIGS-22    Oxygen requirement       Aerobic                             IDA
           Carbon source            Not reported
           Energy source            Chemoorganotroph                    IDA [22]
MIGS-6     Habitat                  Soil, root nodule on host           IDA
MIGS-15    Biotic relationship      Free living, symbiotic              IDA
MIGS-14    Pathogenicity            Non-pathogenic                      NAS
           Biosafety level          1                                   TAS [23]
           Isolation                Root nodule                         IDA
MIGS-4     Geographic location      South Africa                        IDA
MIGS-5     Nodule collection date   October, 2007                       IDA
MIGS-4.1   Longitude                18.621111                           IDA
MIGS-4.2   Latitude                 -31.799722                          IDA
MIGS-4.3   Depth                    Not recorded
MIGS-4.4   Altitude                 Not recorded

Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a
direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not
directly observed for the living, isolated sample, but based on a generally accepted
property for the species, or anecdotal evidence). These evidence codes are from the Gene
Ontology project [24].
Figure 2. Phylogenetic tree showing the relationships of "Burkholderia sprentiae"
strain WSM5005 (shown in blue print) with some of the bacteria in the order
Burkholderiales based on aligned sequences of the 16S rRNA gene (1,322 bp internal
region). All sites were informative and there were no gap-containing sites.
Phylogenetic analyses were performed using MEGA, version 5.05 [25]. The tree was built
using the maximum likelihood method with the General Time Reversible model. Bootstrap
analysis [26] with 500 replicates was performed to assess the support of the clusters.
Type strains are indicated with a superscript T. Strains with a genome sequencing
project registered in GOLD [27] are in bold print and the GOLD ID is mentioned after
the accession number. Published genomes are designated with an asterisk.
Table 2. Genome sequencing project information for "Burkholderia sprentiae" strain WSM5005 T

MIGS ID     Property               Term
MIGS-31     Finishing quality      Improved high-quality draft
MIGS-28     Libraries used         Illumina GAii shotgun and paired end 454 libraries
MIGS-29     Sequencing platforms   Illumina HiSeq 2000 and 454 GS FLX Titanium technologies
MIGS-31.2   Sequencing coverage    8.4x 454 paired end, 300x Illumina
MIGS-30     Assemblers             VELVET 1.0.13, Newbler 2.3, phrap 4.24
MIGS-32     Gene calling methods   Prodigal 1.4, GenePRIMP
            GOLD ID                Gi06497
            GenBank ID             AXBN01000000
            Database: IMG          2510065045
            Project relevance      Symbiotic N fixation, agriculture
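
The sequencing coverage values in Table 2 can be sanity-checked with simple arithmetic: fold coverage is the amount of sequence data divided by the genome size. The sketch below uses the figures reported in the genome sequencing and assembly section of this report (65.2 Mb of 454 draft data, 2,340 Mb of Illumina draft data, and an estimated genome size of 7.8 Mb). The helper name `fold_coverage` is an assumption made for this example.

```python
def fold_coverage(sequenced_mb: float, genome_mb: float) -> float:
    """Average fold coverage = total sequenced bases / genome size (same units)."""
    if genome_mb <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_mb / genome_mb


ESTIMATED_GENOME_MB = 7.8  # estimated genome size stated in the report

cov_454 = fold_coverage(65.2, ESTIMATED_GENOME_MB)          # 454 draft data
cov_illumina = fold_coverage(2340.0, ESTIMATED_GENOME_MB)   # Illumina draft data

print(f"454: {cov_454:.1f}x, Illumina: {cov_illumina:.0f}x")  # 454: 8.4x, Illumina: 300x
```

Both results agree with the 8.4x and 300x values listed in the table.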



Growth conditions and DNA isolation

"Burkholderia sprentiae" strain WSM5005 T was grown to mid-logarithmic phase in TY
rich medium [29] on a gyratory shaker at 28°C. DNA was isolated from 60 mL of cells
using a CTAB (cetyltrimethylammonium bromide) bacterial genomic DNA isolation
method [30].

Genome sequencing and assembly 

The genome of "Burkholderia sprentiae" strain WSM5005 T was sequenced at the Joint
Genome Institute (JGI) using a combination of Illumina [31] and 454 technologies
[32]. Two libraries were generated for this genome: an Illumina GAii shotgun
library, which yielded 76,247,610 reads totaling 5,794.8 Mb, and a paired end 454
library with an average insert size of 13 kb, which yielded 612,483 reads totaling
112.9 Mb of 454 data. All general aspects of library construction and sequencing
performed at the JGI can be found at [30]. The initial draft assembly contained 420
contigs in 8 scaffolds. The 454 paired end data were assembled with Newbler,
version 2.3. The Newbler consensus sequences were computationally shredded into
2 kb overlapping fake reads (shreds). Illumina sequencing data were assembled with
VELVET, version 1.0.13 [33], and the consensus sequences were computationally
shredded into 1.5 kb overlapping fake reads (shreds). We integrated the 454 Newbler
consensus shreds, the Illumina VELVET consensus shreds and the read pairs in the
454 paired end library using parallel phrap, version SPS - 4.24 (High Performance
Software, LLC). The software Consed [34-36] was used in the subsequent finishing
process. Illumina data were used to correct potential base errors and increase
consensus quality using the software Polisher developed at JGI (Alla Lapidus,
unpublished). Possible mis-assemblies were corrected using gapResolution (Cliff
Han, unpublished), Dupfinisher [37], or sequencing cloned bridging PCR fragments
with subcloning. Gaps between contigs were closed by editing in Consed, by PCR and
by Bubble PCR (J-F Cheng, unpublished) primer walks. A total of 352 additional
reactions were necessary to close gaps and to raise the quality of the finished
sequence. The estimated genome size is 7.8 Mb and the final assembly is based on
65.2 Mb of 454 draft data, which provides an average 8.4x coverage of the genome,
and 2,340 Mb of Illumina draft data, which provides an average 300x coverage of
the genome.
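
The "shredding" step described above, in which an assembled consensus sequence is cut into overlapping fake reads so that assemblies from different platforms can be merged, can be sketched as follows. This is only an illustration of the general idea and not the JGI implementation: the function name `shred`, the handling of the final piece, and in particular the amount of overlap are assumptions, since the report gives the shred lengths (2 kb and 1.5 kb) but not the overlap.

```python
from __future__ import annotations


def shred(consensus: str, shred_len: int, overlap: int) -> list[str]:
    """Cut a consensus sequence into pieces of `shred_len` bases that overlap.

    Consecutive pieces start `shred_len - overlap` bases apart. The last
    piece is anchored at the end of the sequence so the tail is covered,
    which means it may overlap its predecessor by more than `overlap`.
    """
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must be >= 0 and smaller than shred_len")
    if len(consensus) <= shred_len:
        # Too short to cut: return the sequence itself (or nothing if empty).
        return [consensus] if consensus else []
    step = shred_len - overlap
    pieces = []
    start = 0
    while start + shred_len < len(consensus):
        pieces.append(consensus[start:start + shred_len])
        start += step
    pieces.append(consensus[-shred_len:])  # final piece covers the tail
    return pieces


# Small demonstration: 4-base shreds overlapping by 2 bases.
print(shred("ABCDEFGHIJ", shred_len=4, overlap=2))
# ['ABCD', 'CDEF', 'EFGH', 'GHIJ']

# 2 kb shreds from a hypothetical 10 kb consensus; the 1 kb overlap is an assumption.
fake_reads = shred("ACGT" * 2500, shred_len=2000, overlap=1000)
print(len(fake_reads), {len(r) for r in fake_reads})
```

Because every shred has the same fixed length and overlaps its neighbours, an overlap-based assembler such as phrap can treat the shreds like ordinary reads when combining them with the paired end data.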

Genome annotation 

Genes were identified using Prodigal [38] as part of the DOE-JGI Annotation
pipeline [39], followed by a round of manual curation using the JGI GenePRIMP
pipeline [40]. The predicted CDSs were translated and used to search the National
Center for Biotechnology Information (NCBI) non-redundant database, UniProt,
TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were
combined to assert a product description for each predicted protein. Non-coding
genes and miscellaneous features were predicted using tRNAscan-SE [41], RNAmmer
[42], Rfam [43], TMHMM [44], and SignalP [45]. Additional gene prediction analyses
and functional annotation were performed within the Integrated Microbial Genomes
(IMG-ER) platform [46].

Genome properties 

The genome is 7,761,063 nucleotides long with 63.18% GC content (Table 3) and
comprises 8 scaffolds of 236 contigs. From a total of 7,223 genes, 7,147 were
protein-encoding genes and 76 were RNA-only encoding genes. Within the genome, 377
pseudogenes were also identified. The majority of genes (76.16%) were assigned a
putative function whilst the remaining genes were annotated as hypothetical. The
distribution of genes into COGs functional categories is presented in Table 4,
Figure 3 and Figure 4.
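
The percentages quoted for the genome properties follow directly from the reported counts. As a quick check, the sketch below recomputes a few of them from the values given in the text and in Table 3. The helper name `percent` and the two-decimal rounding are choices made for this example.

```python
# Counts as reported in the genome properties text and Table 3.
GENOME_BP = 7_761_063
TOTAL_GENES = 7_223


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Gene-level statistics are expressed relative to the total gene count.
gene_shares = {
    "RNA genes": percent(76, TOTAL_GENES),
    "Protein-coding genes": percent(7_147, TOTAL_GENES),
    "Genes with function prediction": percent(5_501, TOTAL_GENES),
}

# Base-level statistics are expressed relative to the genome size in bp.
base_shares = {
    "DNA coding region": percent(6_514_546, GENOME_BP),
    "DNA G+C content": percent(4_903_511, GENOME_BP),
}

for label, value in {**gene_shares, **base_shares}.items():
    print(f"{label}: {value:.2f}%")
```

The recomputed values (1.05%, 98.95%, 76.16%, 83.94% and 63.18%) match those reported.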
Table 3. Genome Statistics for "Burkholderia sprentiae" strain WSM5005 T.

Attribute                          Value        % of Total
Genome size (bp)                   7,761,063    100
DNA coding region (bp)             6,514,546    83.94
DNA G+C content (bp)               4,903,511    63.18
Number of scaffolds                8
Number of contigs                  236
Total genes                        7,223        100
RNA genes                          76           1.05
Protein-coding genes               7,147        98.95
Genes with function prediction     5,501        76.16
Genes assigned to COGs             5,456        75.54
Genes assigned Pfam domains        5,800        80.30
Genes with signal peptides         687          9.51
Genes with transmembrane helices   1,634        22.62
CRISPR repeats                     0







Figure 4. Color code for Figure 3.



Figure 3. Graphical map of the chromosome of "Burkholderia sprentiae" strain
WSM5005 T. From the bottom to the top of each scaffold: Genes on forward strand
(color by COG categories as denoted by the IMG platform), Genes on reverse strand
(color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black),
GC content, GC skew.
Table 4. Number of protein coding genes of "Burkholderia sprentiae" strain WSM5005 T
associated with the general COG functional categories.

Code   Value   %age    Description
J      205     3.34    Translation, ribosomal structure and biogenesis
A      2       0.03    RNA processing and modification
K      566     9.22    Transcription
L      257     4.18    Replication, recombination and repair
B      1       0.02    Chromatin structure and dynamics
D      46      0.75    Cell cycle control, mitosis and meiosis
Y      0       0.00    Nuclear structure
V      70      1.14    Defense mechanisms
T      313     5.10    Signal transduction mechanisms
M      409     6.66    Cell wall/membrane biogenesis
N      114     1.86    Cell motility
Z      0       0.00    Cytoskeleton
W      2       0.03    Extracellular structures
U      154     2.51    Intracellular trafficking and secretion
O      185     3.01    Posttranslational modification, protein turnover, chaperones
C      442     7.20    Energy production and conversion
G      486     7.91    Carbohydrate transport and metabolism
E      576     9.38    Amino acid transport and metabolism
F      96      1.56    Nucleotide transport and metabolism
H      219     3.57    Coenzyme transport and metabolism
I      288     4.69    Lipid transport and metabolism
P      282     4.59    Inorganic ion transport and metabolism
Q      176     2.87    Secondary metabolite biosynthesis, transport and catabolism
R      738     12.02   General function prediction only
S      515     8.38    Function unknown
-      1,767   24.46   Not in COGs
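
The percentages in Table 4 can be reproduced from the counts, and doing so suggests that two different denominators are in use: the per-category values are consistent with the sum of all category counts, while the "Not in COGs" value is consistent with the total gene count of 7,223. The sketch below shows this arithmetic. The variable names are made up for this example, and the reading of the denominators is an inference from the numbers rather than something the report states.

```python
# Per-category counts as listed in Table 4.
cog_counts = {
    "J": 205, "A": 2, "K": 566, "L": 257, "B": 1, "D": 46, "Y": 0,
    "V": 70, "T": 313, "M": 409, "N": 114, "Z": 0, "W": 2, "U": 154,
    "O": 185, "C": 442, "G": 486, "E": 576, "F": 96, "H": 219,
    "I": 288, "P": 282, "Q": 176, "R": 738, "S": 515,
}
NOT_IN_COGS = 1_767
TOTAL_GENES = 7_223  # total genes from Table 3

# Denominator for the category rows: the sum of all category counts.
total_assignments = sum(cog_counts.values())
cog_percent = {
    code: round(100.0 * count / total_assignments, 2)
    for code, count in cog_counts.items()
}

# Denominator for the last row: the total number of genes.
not_in_cogs_percent = round(100.0 * NOT_IN_COGS / TOTAL_GENES, 2)

print(total_assignments)                   # 6142
print(cog_percent["J"], cog_percent["R"])  # 3.34 12.02
print(not_in_cogs_percent)                 # 24.46
```

The sum of the category counts (6,142) is larger than the number of genes assigned to COGs in Table 3 (5,456), which would be consistent with some genes being counted in more than one category; that interpretation is an assumption and is not confirmed by the report.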